{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example shows how to collect paginated data from an ecommerce website using AgentQL with the [Google Colaboratory](https://colab.research.google.com/) environment."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install the [AgentQL](https://pypi.org/project/agentql/) library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install agentql"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install the Playwright dependency required by AgentQL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!playwright install chromium"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Store your AgentQL API key in Google Colab's [secrets](https://medium.com/@parthdasawant/how-to-use-secrets-in-google-colab-450c38e3ec75) under the name `AGENTQL_API_KEY`, then load it into the environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from google.colab import userdata\n",
    "\n",
    "# Expose the key stored in Colab's secrets to AgentQL\n",
    "os.environ[\"AGENTQL_API_KEY\"] = userdata.get(\"AGENTQL_API_KEY\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the AgentQL script. You can't see the browser window directly in Google Colab as you would on your local machine, so the script records a video of the browser session and displays it once the session is finished.\n",
    "\n",
    "Please note that an online environment like Google Colab supports the **asynchronous version** of AgentQL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import logging\n",
    "import os\n",
    "\n",
    "import agentql\n",
    "from playwright.async_api import async_playwright\n",
    "from IPython.display import HTML, display\n",
    "from base64 import b64encode\n",
    "\n",
    "logging.basicConfig(level=logging.DEBUG)\n",
    "log = logging.getLogger(__name__)\n",
    "\n",
    "# Top-level `async with` works here because the notebook already runs an event loop\n",
    "async with async_playwright() as playwright, await playwright.chromium.launch() as browser:\n",
    "\n",
    "    # Set up the video recording\n",
    "    video_dir = os.path.abspath(\"videos\")\n",
    "    context = await browser.new_context(\n",
    "        record_video_dir=video_dir,\n",
    "        record_video_size={\"width\": 1280, \"height\": 720},\n",
    "    )\n",
    "    page = await agentql.wrap_async(await context.new_page())\n",
    "    await page.goto(\"https://books.toscrape.com/\")\n",
    "\n",
    "    # Define the query to extract book names, prices, and ratings\n",
    "    QUERY = \"\"\"\n",
    "    {\n",
    "            books[] {\n",
    "                name\n",
    "                price\n",
    "                rating\n",
    "            }\n",
    "    }\n",
    "    \"\"\"\n",
    "\n",
    "    books = []\n",
    "\n",
    "    # Aggregate the first 50 book names, prices, and ratings\n",
    "    while len(books) < 50:\n",
    "        # Collect data from the current page\n",
    "        response = await page.query_data(QUERY)\n",
    "\n",
    "        # Take only as many books as are still needed to reach 50\n",
    "        books.extend(response[\"books\"][:50 - len(books)])\n",
    "\n",
    "        # Get the pagination info from the current page\n",
    "        pagination_info = await page.get_pagination_info()\n",
    "\n",
    "        # Stop when there is no next page; otherwise the loop would never end\n",
    "        if not pagination_info.has_next_page:\n",
    "            break\n",
    "        await pagination_info.navigate_to_next_page()\n",
    "\n",
    "    with open(\"./books.json\", \"w\") as f:\n",
    "        json.dump(books, f, indent=4)\n",
    "\n",
    "    # Close the context so that Playwright finishes writing the video file\n",
    "    await context.close()\n",
    "\n",
    "    # Display the video\n",
    "    video_files = [f for f in os.listdir(video_dir) if f.endswith('.webm')]\n",
    "    if video_files:\n",
    "        video_path = os.path.join(video_dir, video_files[0])\n",
    "        with open(video_path, 'rb') as f:\n",
    "            video_bytes = f.read()\n",
    "\n",
    "        video_b64 = b64encode(video_bytes).decode('utf-8')\n",
    "        video_html = f\"\"\"\n",
    "        <video width=\"800\" controls>\n",
    "          <source src=\"data:video/webm;base64,{video_b64}\" type=\"video/webm\">\n",
    "          Your browser does not support the video tag.\n",
    "        </video>\n",
    "        \"\"\"\n",
    "        display(HTML(video_html))\n",
    "    else:\n",
    "        print(\"No video file was created\")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
